57 research outputs found
Emerging from Water: Underwater Image Color Correction Based on Weakly Supervised Color Transfer
Underwater vision suffers from severe effects due to selective attenuation
and scattering when light propagates through water. Such degradation not only
affects the quality of underwater images but also limits the performance of vision
tasks. Different from existing methods which either ignore the wavelength
dependency of the attenuation or assume a specific spectral profile, we tackle
the color distortion problem of underwater images from a new view. In this letter,
we propose a weakly supervised color transfer method to correct color
distortion, which relaxes the need for paired underwater images for training and
allows the underwater images to be taken in unknown locations. Inspired by
Cycle-Consistent Adversarial Networks, we design a multi-term loss function
including adversarial loss, cycle consistency loss, and SSIM (Structural
Similarity Index Measure) loss, which keeps the content and structure of the
corrected result the same as the input, but renders the color as if the image had been taken
without the water. Experiments on underwater images captured under diverse
scenes show that our method produces visually pleasing results and even
outperforms state-of-the-art methods. Besides, our method can improve the
performance of vision tasks. Comment: Submitted to IEEE Signal Processing Letters
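As a rough illustration of the multi-term loss described in this abstract, the sketch below combines an adversarial term, an L1 cycle-consistency term, and an SSIM term. It is not the authors' implementation: the function names, the weights `lam_cyc` and `lam_ssim`, the single-window (global) SSIM, and the use of flat grayscale pixel lists are all assumptions made to keep the example small.

```python
def mean(values):
    return sum(values) / len(values)


def ssim_global(x, y, data_range=1.0):
    """SSIM computed over one global window (a simplification of windowed SSIM)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = mean(x), mean(y)
    var_x = mean([(a - mx) ** 2 for a in x])
    var_y = mean([(b - my) ** 2 for b in y])
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx * mx + my * my + c1) * (var_x + var_y + c2)
    return numerator / denominator


def cycle_l1(x, x_cycled):
    """Mean absolute error between an input and its round-trip reconstruction."""
    return mean([abs(a - b) for a, b in zip(x, x_cycled)])


def total_loss(adv_loss, x, x_cycled, x_corrected, lam_cyc=10.0, lam_ssim=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    ssim_loss = 1.0 - ssim_global(x, x_corrected)  # penalize structural change
    return adv_loss + lam_cyc * cycle_l1(x, x_cycled) + lam_ssim * ssim_loss
```

Here `adv_loss` is assumed to be computed elsewhere by a discriminator; only the way the three terms are combined is shown.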
Zero-Shot Learning via Latent Space Encoding
Zero-Shot Learning (ZSL) is typically achieved by resorting to a class
semantic embedding space to transfer the knowledge from the seen classes to
unseen ones. Capturing the common semantic characteristics between the visual
modality and the class semantic modality (e.g., attributes or word vector) is a
key to the success of ZSL. In this paper, we propose a novel encoder-decoder
approach, namely Latent Space Encoding (LSE), to connect the semantic relations
of different modalities. Instead of requiring a projection function to transfer
information across different modalities like most previous work, LSE performs
the interactions of different modalities via a feature aware latent space,
which is learned in an implicit way. Specifically, different modalities are
modeled separately but optimized jointly. For each modality, an encoder-decoder
framework is performed to learn a feature aware latent space via jointly
maximizing the recoverability of the original space from the latent space and
the predictability of the latent space from the original space. To relate
different modalities together, their features referring to the same concept are
enforced to share the same latent codings. In this way, the common semantic
characteristics of different modalities are generalized with the latent
representations. Another property of the proposed approach is that it is easily
extended to more modalities. Extensive experimental results on four benchmark
datasets (AwA, CUB, aPY, and ImageNet) clearly demonstrate the superiority of
the proposed approach on several ZSL tasks, including traditional ZSL,
generalized ZSL, and zero-shot retrieval (ZSR).
Transductive Zero-Shot Learning with Adaptive Structural Embedding
Zero-shot learning (ZSL) endows the computer vision system with the
inferential capability to recognize instances of a new category that has never
been seen before. Two fundamental challenges in it are visual-semantic embedding and
domain adaptation in cross-modality learning and unseen class prediction steps,
respectively. To address both challenges, this paper presents two corresponding
methods named Adaptive STructural Embedding (ASTE) and Self-PAced Selective
Strategy (SPASS), respectively. Specifically, ASTE formulates the
visual-semantic interactions in a latent structural SVM framework to adaptively
adjust the slack variables to embody the different reliability among training
instances. In this way, the reliable instances are imposed with small
punishments, whereas the less reliable instances are imposed with more severe
punishments. Thus, it ensures a more discriminative embedding. On the other
hand, SPASS offers a framework to alleviate the domain shift problem in ZSL,
which exploits the unseen data in an easy-to-hard fashion. Particularly, SPASS
borrows the idea from self-paced learning by iteratively selecting the unseen
instances from reliable to less reliable to gradually adapt the knowledge from
the seen domain to the unseen domain. Subsequently, by combining SPASS and
ASTE, we present a self-paced Transductive ASTE (TASTE) method to progressively
reinforce the classification capacity. Extensive experiments on three benchmark
datasets (i.e., AwA, CUB, and aPY) demonstrate the superiorities of ASTE and
TASTE. Furthermore, we also propose a fast training (FT) strategy to improve
the efficiency of most existing ZSL methods. The FT strategy is surprisingly
simple and general, and it can speed up the training of most
existing methods by 4 to 300 times while maintaining their previous performance.
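The easy-to-hard selection idea behind SPASS can be sketched as follows. This is only an illustration under stated assumptions: the function `self_paced_rounds`, the fixed confidence scores, and the linear growth schedule are hypothetical, and a real transductive method would retrain the model and recompute the scores between rounds.

```python
def self_paced_rounds(scores, num_rounds):
    """Yield, for each round, the indices admitted so far, most reliable first.

    scores[i] is a confidence for unseen instance i (higher = more reliable).
    The admitted set grows linearly until every instance is included.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = len(order)
    for r in range(1, num_rounds + 1):
        k = -(-n * r // num_rounds)  # ceil(n * r / num_rounds)
        # A real pipeline would update the model on order[:k] here
        # and refresh `scores` before the next round.
        yield order[:k]


if __name__ == "__main__":
    confidences = [0.9, 0.2, 0.7, 0.4, 0.8]
    for selected in self_paced_rounds(confidences, num_rounds=2):
        print(selected)
```

With the five confidences above and two rounds, the first round admits the three most confident instances and the second round admits all five.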
DR-Net: Transmission Steered Single Image Dehazing Network with Weakly Supervised Refinement
Despite the recent progress in image dehazing, several problems remain
largely unsolved such as robustness for varying scenes, the visual quality of
reconstructed images, and effectiveness and flexibility for applications. To
tackle these problems, we propose a new deep network architecture for single
image dehazing called DR-Net. Our model consists of three main subnetworks: a
transmission prediction network that predicts transmission map for the input
image, a haze removal network that reconstructs latent image steered by the
transmission map, and a refinement network that enhances the details and color
properties of the dehazed result via weakly supervised learning. Compared to
previous methods, our method advances in three aspects: (i) a purely data-driven
model; (ii) an end-to-end system; (iii) superior robustness, accuracy, and
applicability. Extensive experiments demonstrate that our DR-Net outperforms
the state-of-the-art methods on both synthetic and real images in qualitative
and quantitative metrics. Additionally, the utility of DR-Net has been
illustrated by its potential usage in several important computer vision tasks. Comment: 8 pages, 8 figures, submitted to CVPR 201
Semantic Softmax Loss for Zero-Shot Learning
A typical pipeline for Zero-Shot Learning (ZSL) is to integrate the visual
features and the class semantic descriptors into a multimodal framework with a
linear or bilinear model. However, because the visual features and the class semantic
descriptors lie in different structural spaces, a linear or bilinear model
cannot capture the semantic interactions between different modalities well. In
this letter, we propose a nonlinear approach to formulate ZSL as a multi-class
classification problem via a Semantic Softmax Loss by embedding the class
semantic descriptors into the softmax layer of multi-class classification
network. To narrow the structural differences between the visual features and
semantic descriptors, we further apply an L2 normalization constraint to the
differences between the visual features and visual prototypes reconstructed
with the semantic descriptors. The results on three benchmark datasets, i.e.,
AwA, CUB, and SUN, demonstrate that the proposed approach can steadily boost performance
and achieve state-of-the-art performance for both zero-shot
classification and zero-shot retrieval.
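One way to read "embedding the class semantic descriptors into the softmax layer" is that the class weight vectors of the final layer are replaced by the class descriptors. The sketch below follows that reading and is an assumption, not the paper's network: the single `tanh` layer with weights `W` stands in for an arbitrary nonlinear visual branch, and all names are hypothetical.

```python
import math


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def semantic_softmax(x, W, S):
    """Class probabilities whose logits are dot products with class descriptors.

    x: visual feature vector.
    W: weights mapping the visual feature into the semantic space (hypothetical).
    S: one semantic descriptor per class (rows), used as the softmax weights.
    """
    z = [math.tanh(v) for v in matvec(W, x)]  # nonlinear visual embedding
    logits = matvec(S, z)                     # similarity to each class descriptor
    top = max(logits)                         # subtract max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Under this reading, recognizing unseen classes at test time only requires swapping `S` for the descriptors of the unseen classes, since no per-class weights are learned.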
A Cascaded Convolutional Neural Network for Single Image Dehazing
Images captured under outdoor scenes usually suffer from low contrast and
limited visibility due to suspended atmospheric particles, which directly
affects the quality of photos. Although numerous image dehazing methods have
been proposed, effective hazy image restoration remains a challenging problem.
Existing learning-based methods usually predict the medium transmission by
Convolutional Neural Networks (CNNs), but ignore the key global atmospheric
light. Different from previous learning-based methods, we propose a flexible
cascaded CNN for single hazy image restoration, which considers the medium
transmission and global atmospheric light jointly by two task-driven
subnetworks. Specifically, the medium transmission estimation subnetwork is
inspired by the densely connected CNN while the global atmospheric light
estimation subnetwork is a light-weight CNN. Besides, these two subnetworks are
cascaded by sharing the common features. Finally, with the estimated model
parameters, the haze-free image is obtained by the atmospheric scattering model
inversion, which achieves more accurate and effective restoration performance.
Qualitative and quantitative experimental results on synthetic and
real-world hazy images demonstrate that the proposed method effectively removes
haze from such images, and outperforms several state-of-the-art dehazing
methods. Comment: This manuscript is accepted by IEEE Access
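The final step mentioned in this abstract, inverting the atmospheric scattering model, can be sketched as follows. The standard model is I = J * t + A * (1 - t), so the haze-free value is J = (I - A) / t + A. The lower bound `t_min` on the transmission and the clipping to [0, 1] are common safeguards assumed here, not details taken from the paper, and the function names are hypothetical.

```python
def dehaze_pixel(hazy, transmission, airlight, t_min=0.1):
    """Invert I = J*t + A*(1 - t) for one intensity value in [0, 1]."""
    t = max(transmission, t_min)           # avoid dividing by near-zero transmission
    restored = (hazy - airlight) / t + airlight
    return min(1.0, max(0.0, restored))    # clip to the valid intensity range


def dehaze(image, transmission_map, airlight, t_min=0.1):
    """Apply the inversion to a 2-D grayscale image given a per-pixel transmission map."""
    return [
        [dehaze_pixel(i, t, airlight, t_min) for i, t in zip(img_row, t_row)]
        for img_row, t_row in zip(image, transmission_map)
    ]
```

In the cascaded setting described above, `transmission_map` and `airlight` would come from the two estimation subnetworks; here they are simply inputs.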
Stacked Semantic-Guided Attention Model for Fine-Grained Zero-Shot Learning
Zero-Shot Learning (ZSL) is achieved via aligning the semantic relationships
between the global image feature vector and the corresponding class semantic
descriptions. However, using the global features to represent fine-grained
images may lead to sub-optimal results since they neglect the discriminative
differences of local regions. Besides, different regions contain distinct
discriminative information. The important regions should contribute more to the
prediction. To this end, we propose a novel stacked semantics-guided attention
(S2GA) model to obtain semantically relevant features by using individual class
semantic features to progressively guide the visual features to generate an
attention map for weighting the importance of different local regions. Feeding
both the integrated visual features and the class semantic features into a
multi-class classification architecture, the proposed framework can be trained
end-to-end. Extensive experimental results on CUB and NABird datasets show that
the proposed approach consistently improves both fine-grained
zero-shot classification and retrieval tasks.
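A single semantics-guided attention layer of the kind described here might look like the sketch below: each local region is scored against a guidance vector, the scores are normalized with a softmax, and the region features are averaged with those weights. This is an assumed simplification; the dot-product scoring, the requirement that `guide` already lives in the visual feature space, and the function names are hypothetical, and the stacked model would apply such a layer repeatedly with an updated guide.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_guided_attention(regions, guide):
    """Weight local region features by their relevance to a guidance vector.

    regions: list of region feature vectors (all the same length).
    guide: semantic guidance vector, assumed to be projected to the same length.
    Returns (attention weights, attention-weighted feature).
    """
    scores = [sum(r * g for r, g in zip(region, guide)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    attended = [
        sum(w * region[d] for w, region in zip(weights, regions))
        for d in range(dim)
    ]
    return weights, attended
```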
Bi-Adversarial Auto-Encoder for Zero-Shot Learning
Existing generative Zero-Shot Learning (ZSL) methods only consider the
unidirectional alignment from the class semantics to the visual features while
ignoring the alignment from the visual features to the class semantics, which
fails to construct the visual-semantic interactions well. In this paper, we
propose to synthesize visual features based on an auto-encoder framework paired
with bi-adversarial networks respectively for visual and semantic modalities to
reinforce the visual-semantic interactions with a bi-directional alignment,
which ensures that the synthesized visual features fit the real visual
distribution and are highly related to the semantics. The encoder aims at
synthesizing real-like visual features while the decoder forces both the real
and the synthesized visual features to be more related to the class semantics.
To further capture the discriminative information of the synthesized visual
features, both the real and synthesized visual features are forced to be
classified into the correct classes via a classification network. Experimental
results on four benchmark datasets show that the proposed approach is
particularly competitive on both the traditional ZSL and the generalized ZSL
tasks.
Transductive Zero-Shot Learning with a Self-training dictionary approach
As an important and challenging problem in computer vision, zero-shot
learning (ZSL) aims at automatically recognizing the instances from unseen
object classes without training data. To address this problem, ZSL is usually
carried out in the following two aspects: 1) capturing the domain distribution
connections between seen-class data and unseen-class data; and 2) modeling
the semantic interactions between the image feature space and the label
embedding space. Motivated by these observations, we propose a bidirectional
mapping based semantic relationship modeling scheme that seeks cross-modal
knowledge transfer by simultaneously projecting the image features and label
embeddings into a common latent space. Namely, we have a bidirectional
connection relationship that takes place from the image feature space to the
latent space as well as from the label embedding space to the latent space. To
deal with the domain shift problem, we further present a transductive learning
approach that formulates the class prediction problem in an iterative refining
process, where the object classification capacity is progressively reinforced
through bootstrapping-based model updating over highly reliable instances.
Experimental results on three benchmark datasets (AwA, CUB and SUN) demonstrate
the effectiveness of the proposed approach against the state-of-the-art
approaches.
UIEC^2-Net: CNN-based Underwater Image Enhancement Using Two Color Space
Underwater image enhancement has attracted much attention due to the rise of
marine resource development in recent years. Benefiting from the powerful
representation capabilities of Convolution Neural Networks (CNNs), multiple
underwater image enhancement algorithms based on CNNs have been proposed in the
last few years. However, almost all of these algorithms employ RGB color space
setting, which is insensitive to image properties such as luminance and
saturation. To address this problem, we propose Underwater Image Enhancement
Convolution Neural Network using 2 Color Space (UIEC^2-Net) that efficiently
and effectively integrates both RGB Color Space and HSV Color Space in one
single CNN. To the best of our knowledge, this method is the first to use HSV color
space for underwater image enhancement based on deep learning. UIEC^2-Net is an
end-to-end trainable network, consisting of three blocks as follows: an RGB
pixel-level block that implements fundamental operations such as denoising and
removing color cast, an HSV global-adjust block for globally adjusting
underwater image luminance, color and saturation by adopting a novel neural
curve layer, and an attention map block for combining the advantages of RGB and
HSV block output images by distributing weight to each pixel. Experimental
results on synthetic and real-world underwater images show the good performance
of our proposed method in both subjective comparisons and objective metrics.
The code is available at https://github.com/BIGWangYuDong/UWEnhancement. Comment: 11 pages, 11 figures
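The last block in this abstract combines the RGB and HSV outputs "by distributing weight to each pixel". A minimal sketch of one such per-pixel combination is given below. The convex blend `w * rgb + (1 - w) * hsv_as_rgb`, the function names, and the nested-list image layout are assumptions for illustration; the linked repository should be consulted for the actual formulation.

```python
import colorsys


def blend_pixel(rgb_px, hsv_px, w):
    """Blend one RGB-branch pixel with one HSV-branch pixel using weight w in [0, 1]."""
    hsv_as_rgb = colorsys.hsv_to_rgb(*hsv_px)  # bring the HSV output into RGB
    return tuple(w * a + (1.0 - w) * b for a, b in zip(rgb_px, hsv_as_rgb))


def blend_images(rgb_img, hsv_img, weight_map):
    """Apply the per-pixel blend over whole images stored as nested lists."""
    return [
        [blend_pixel(p, q, w) for p, q, w in zip(rgb_row, hsv_row, w_row)]
        for rgb_row, hsv_row, w_row in zip(rgb_img, hsv_img, weight_map)
    ]
```

All channel values are assumed to lie in [0, 1], which is the range `colorsys` works with.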